This Page Is Inserted by IFW Operations 
and is not a part of the Official Record 



BEST AVAILABLE IMAGES 

Defective images within this document are accurate representations of 
the original documents submitted by the applicant.

Defects in the images may include (but are not limited to):

• BLACK BORDERS 

• TEXT CUT OFF AT TOP, BOTTOM OR SIDES

• FADED TEXT 

• ILLEGIBLE TEXT

• SKEWED/SLANTED IMAGES

• COLORED PHOTOS 

• BLACK OR VERY BLACK AND WHITE DARK PHOTOS 

• GRAY SCALE DOCUMENTS 

IMAGES ARE BEST AVAILABLE COPY. 



As rescanning documents will not correct images, 
please do not report the images to the 
Image Problem Mailbox. 



data access memory usage less frequent - ResearchIndex document query Page 1







Searching for PHRASE data access memory usage less frequent. 


No documents match Boolean query. Trying non-Boolean relevance query.
1000 documents found. Retrieving documents... Order: relevance to query. 

Adaptive Runtime Support for Direct Simulation Monte Carlo.. - Moon, Saltz (1994) (Correct) (9 citations)
and Direct Simulation Monte Carlo (DSMC) codes, data access patterns may vary from time step to time 
Direct Simulation Monte Carlo (DSMC) codes, data access patterns may vary from time step to time step. 
Simulation Monte Carlo Methods on Distributed Memory Architectures Bongki Moon Joel Saltz Institute 

ftp.cs.umd.edu/pub/papers/papers/ncstrl.umcp/CS-TR-3427/CS-TR-3427.ps.Z

Application-Controlled Physical Memory using External Page-Cache.. - Harty (1992) (Correct) (81 citations)
applications such as scientific simulations and database management systems will require more 
mapping, an application can optimize for efficient access based on the system memory organization and the 
Application-Controlled Physical Memory using External Page-Cache Management Kieran 

www.cs.berkeley.edu/~brewer/cs262/hc.ps

Interprocedural Array Data-Flow Analysis for Cache Coherence - Choi, Yew (1995) (Correct) (1 citation)
Interprocedural Array Data-Flow Analysis for Cache Coherence Lynn Choi y 
We also propose a condition for a stale access, which identifies the memory reference sequence 
expansion and increase in its compilation time and memory requirements. In this paper, we introduce an 

polaris.cs.uiuc.edu/reports/1427.ps.gz

Dynamic Word Problems - Frandsen, Miltersen, Skyum (1993) (Correct) (5 citations)

monoid. We consider the problem of implementing a data type containing a vector x =x1 x2 

complexity of a computation is the number of cells accessed in the random access memory containing the data 

the number of cells accessed in the random access memory containing the data structure during the 

www.brics.dk/~gudmund/Documents/dwp.ps

Scalable Caching Techniques for a Weakly Coherent Memory - Zamanifar, Nash, Dew (1995) (Correct)
computational model, with the ability to exploit data locality for good performance. Today, this is 
However, it is well known that coordinating data access and implementing synchronisation are difficult 

agora.leeds.ac.uk/scs/doc/reports/1995/95_34.ps.Z

Global Compiler Analysis for Optimizing Tuplespace.. - James Fenwick (1996) (Correct)
Linda compiler, we have developed and implemented a data flow framework which statically estimates the 
However, the properties of associative access and uncoupled communication that give rise to 
challenges, especially on distributed memory systems. This paper provides concrete steps 

www.cis.udel.edu/~pollock/papers/pdcs96.ps

On-Line Prediction of Multiprocessor Memory Access.. - Sakr, Giles, Levitan.. (1996) (Correct) (1 citation)
in the IN launch time, the time to transmit the data into the IN and fly time, the time needed for the 
On-Line Prediction of Multiprocessor Memory Access Patterns M.F. Sakr 1 ,2 C. L. Giles 1 ,5 
On-Line Prediction of Multiprocessor Memory Access Patterns M.F. Sakr 1,2 C. L. Giles 

www.neci.nec.com/~giles/papers/ICNN96.multiprocessor.prediction.ps.Z

Web Based Parallel/Distributed Medical Data Mining.. - Kargupta, Stafford.. (Correct)

Web Based Parallel/Distributed Medical Data Mining Using Software Agents Hillol Kargupta, 

Agents) that uses software agents for local data accessing and analysis and a web based interface for 

However it is easily portable to any distributed memory machine provided that MPI is operational on this

www.eecs.wsu.edu/~hillol/pubs/padmaMed.ps

Programming In Vienna Fortran - Chapman, Mehrotra, Zima (1992) (Correct) (118 citations)
memory machines requires a careful distribution of data across the processors. Vienna Fortran is a 

ftp.gmd.de/guests/hpf-europe/vftn-paper.ps.Z

Information Management in Process-Centered.. - Barghouti.. (1995) (Correct)

PSEEs include a repository that stores product data or process enactment data or both. Different PSEEs 

tokio.dbis.informatik.uni-frankfurt.de/REPORTS/GOODSTEP/GoodStepReport023.ps.gz



Integrating Temporal, Real-Time, and Active Databases - Ramamritham.. (1996) (Correct) (3 citations)
Integrating Temporal, Real-Time, and Active Databases Krithi Ramamritham, Raju Sivasankaran, John

Can High Bandwidth and Latency Justify Large Cache Blocks.. - Bianchini, LeBlanc (1994) (Correct) (1 citation)
multiprocessors use hardware caches to keep data close to the processors that need it, and thereby 
behavior of application programs and the remote access bandwidth and latency of the machine. Several 
the performance of coherent caches in shared-memory multiprocessors is the choice of block size. 

ftp.cs.rochester.edu/pub/papers/systems/94.tr486.Can_high_bandwidth_and_latency_justify_large_cach

Efficient Detection of All Pointer and Array Access Errors - Austin, Breach, Sohi (1993) (Correct) (23 citations)
overheads range from 130% to 540% with text and data size overheads typically below 100%. © 1993
Efficient Detection of All Pointer and Array Access Errors Todd M. Austin Scott E. Breach Gurindar 

ftp.cs.wisc.edu/tech-reports/reports/93/tr1 1 SZ.ps.Z

Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors - McKee (1994) (Correct)
to the poor temporal and spatial locality of their data accesses. Moreover, the nature of memories 
Dynamic Access Ordering for Symmetric Shared-Memory 

ftp.cs.virginia.edu/pub/techreports/CS-94-14.ps.Z

Memory Scalability in Constraint-Based Multimedia Style.. - Cumaranatunge, Munson (1998) (Correct)
syntax trees are very large and the constraint data for a medium-sized source file can easily consume 
is constraint by its left sibling's Y attribute. AccessOp(Y) LeftSib )VertPos:Y Figure 3: Inverse path 
April 1998 (provisional paper July 27, 1997) Memory Scalability in Constraint-Based Multimedia Style 

www.cs.uwm.edu/faculty/munson/pubs/ep98.ps

Table-Lookup Approach for Compiling Two-Level Data-Processor.. - Kuei-Ping Shih (1997) (Correct)
Table-Lookup Approach for Compiling Two-Level Data-Processor Mappings in HPF Kuei-Ping Shih y 

axp1.csie.ncu.edu.tw/ftp/pub/tech-report/1997/./steven-LCPC97f.ps.gz

Correction of a Memory Management Method for Lock-Free Data.. - Michael, Scott (1995) (Correct) (5 citations)
of a Memory Management Method for Lock-Free Data Structures Maged M. Michael Michael L. Scott 
structures, processes have to synchronize their access to them. Mutual exclusion locks are the most 
Correction of a Memory Management Method for Lock-Free Data Structures 

hypatia.dcs.qmw.ac.uk/data/edu/cs.rochester.edu/systems/95.tr599.Memory_manage

Fortran 90D/HPF Compiler for Distributed Memory.. - Bozkus.. (1993) (Correct) (2 citations)
April 1, 1993 Abstract Fortran 90D/HPF is a data parallel language with special directives to 
distribution and communication for non-local data access. There has been significant research in 
Fortran 90D/HPF Compiler for Distributed Memory MIMD Computers: Design, Implementation, and 

ftp.cis.ufl.edu/pub/faculty/ranka/compiler_sc93.ps.Z

Shared Memory NUMA Programming on I-WAY - Nieplocha, Harrison (1996) (Correct) (8 citations)
(NUMA) programming model [1-3] and reference data in blocks to increase data locality in order to
the Global Array shared-memory nonuniform memory-access programming model is explored on the I-WAY,
1 Shared Memory NUMA Programming on I-WAY J. Nieplocha and R. J.

ftp.pnl.gov/pub/permanent/'giobaii^way.ps.Z

Effectiveness of Message Strip-Mining for Regular and.. - Akiyoshi Wakatani (1994) (Correct) (12 citations)
implement parallel algorithms by distributing large data structures across a multicomputer system. To hide 
(regular communication) and executor for indirect access (irregular communication) and have achieved
to make a program executable on any distributed memory multicomputer. HPF also allows use of expensive 

www.cse.ogi.edu/Sparse/paper/wakatani.pdcs.94.ps

pC++, Charm++ and Orca: Languages for Parallel Programming - Niemiec (1993) (Correct)
in parallel computing. The need for functional and data parallelism is recognized and specifically 
error. When a currently executing thread wants to access a data unit of type sync which is not defined, it 
and the given support for distributed memory machines. Because of these criteria, concurrent 

www.npac.syr.edu/projects/hpsin/doc/ccpp.ps

An Analytical Approach to File Prefetching - Lei (1997) (Correct) (33 citations)

hopefully in advance of the actual need for the data. Prefetched data is then placed in the client's 

is an effective technique for improving file access performance. In this paper, we present a file 

www.mcl.cs.columbia.edu/papers/usenix97.ps.gz

Discovery of numerical dependencies in form of rational.. - Kiselev, Arseniev (Correct)
class [1 , 2] Searching for structure hidden in data PolyAnalyst builds and tests hypotheses about 
primitives. Information in databases is accessed via special data access primitives which also 
compromise is achieved due to choice of more or less narrow set of formulae in which search is 

www.megaputer.com/DOWN/PA!smis6.PS

The RD13 DAQ System and the Object Management Workbench - Bob Jones (Correct)
as used in the RD13 research project for data acquisition systems of high energy physics

rd13doc.cern.ch/public/doc/postscript/RD13_summerSchool95Jones.ps

Scheduling Access To Temporal Data In Real-Time Databases - Xiong, Sivasankaran.. (1997) (Correct) (3 citations)
1 Scheduling Access To Temporal Data In Real-Time Databases Ming Xiong, Rajendran 

www • CCS . OS . u ma ss . ed uhsi m/rtd b - oh a pte r96 . ps | 

Mechanisms and Interfaces for Software-Extended Coherent Shared.. - Chaiken (1994) (Correct) (3 citations)
to optimize accesses to widely-shared, read-only data and improves one benchmark's performance by 22% 

ftp.cag.lcs.mit.edu/pub/papers/chaiken-dissert-1-10.ps.Z

Cautious, Machine-Independent Performance Tuning for.. - Talbot, Bennett, Kelly (Correct)

to ensure that CPUs do not use stale cached data. In addition to the overheads of maintaining 

links. These all conspire to increase memory access times, and hence slow down the execution time of 

Machine-Independent Performance Tuning for Shared-Memory Multiprocessors Sarah A. M. Talbot, Andrew J. 

www-ala.doc.ic.ac.uk/~phjk/Publications/CautiousMachineIndependent.EuroPar97.ps.gz

Energy-Efficient Index Replication for Wireless Data Broadcasting - Yon Dohn (Correct)
Energy-Efficient Index Replication for Wireless Data Broadcasting Yon Dohn Chung Myoung Ho Kim 
all data stream must be read from the time of access request to the time until all requested data are 

dbserver.kaist.ac.kr/NEW/Warehouse/./thesis_store/ydchung7.ps.gz

Using Data Structures within Genetic Programming - Langdon (Correct) (5 citations)
1 Using Data Structures within Genetic Programming W. B. 

is written so that it be independent of memory access implementation details. This is achieved by using 
some cases, provision of appropriately structured memory can indeed be advantageous to GP in comparison 

ftp.cs.bham.ac.uk/pub/authors/W.B.Langdon/papers/WBL.gp96.submitted.ps

Fine-Grain Dataflow Model And Algorithms For Visualization Systems - Song (1994) (Correct)
Fine-Grain Dataflow Model And Algorithms For Visualization 

ftp.ncsa.uiuc.edu/ncsapubs/preprints/TR018.ps.Z

On using Network Memory to Improve the Performance of - Ioannidis, Markatos (1997) (Correct)
ranging from CAD environment to large-scale databases. Unfortunately, adding transaction support to 

www.ics.forth.gr/arch-vlsi/OS/papers/1997.TR190.Remote_memory_.RVM.ps.gz

Asynchronous Version Advancement in a Distributed.. - Jagadish, Mumick.. (Correct)

Version Advancement in a Distributed Three Version Database H. V. Jagadish AT&T Laboratories 

wv/vv.rese rch. tic m/-mish /multiversi n/ synch ersi ning.ps.gz 

Extending Locking Techniques to Improve Concurrent Database.. - Cesar Galindo-Legaria (Correct)
Extending locking techniques to improve concurrent database access Cesar Galindo-Legaria Fausto
locking techniques to improve concurrent database access Cesar Galindo-Legaria Fausto Rabitti IEI-CNR

ftp.inria.fr/associations/ERCIM/research_reports/ps/0495R036.ps

A Host Interface to the DTM Network - Ahlgren, Pink, Lindgren.. (1992) (Correct) (1 citation)
segmenting and reassembling packets to and from the data units of the dtm. The software part of the 
port memory residing on the interface card and accessible over the SBus from the host cpu. The host 
The interface is based on a dual port memory residing on the interface card and accessible 

ftp.sics.se/pub/SICS-reports/Reports/SICS-R--92-01--SE.ps.Z

The Zebra Striped Network File System - Hartman, Ousterhout (1993) (Correct) (118 citations)
system that increases throughput by striping file data across multiple servers. Rather than striping each 

www.cs.arizona.edu/people/jhh/papers/zebra_tocs.ps

Wait-Free Synchronization - Herlihy (1993) (Correct) (100 citations)

A wait-free implementation of a concurrent data object is one that guarantees that any process can 

others, and some memory locations may be slower to access. A wait-free implementation of a concurrent data

may be inherently faster than others, and some memory locations may be slower to access. A wait-free 

www.cs.brown.edu/courses/cs196a/toplas.ps

Understanding Language Support for Irregular Parallelism - Raghavachari, Rogers (1995) (Correct) (1 citation)
Abstract While software support for array-based, data-parallel algorithms has been studied extensively, 
x4.2 Distribution User Control p p p Replication Access Control p ?D p p TSM a default) 

ftp.cs.princeton.edu/techreports/1996/506.ps.Z

Multi-level Partitioning and Scheduling under Local Memory.. - Qingyan Wang (1995) (Correct)

and DSP applications. Due to the large amount of data handled by such applications, the optimization of 

www.cse.nd.edu/pub/Reports/1995/tr-95-11.ps.gz

Implementation and Evaluation of Prefetching in the.. - Arunachalam.. (1996) (Correct)

be addressed to a certain extent, if the necessary data can be fetched from the disk before the I/O call 

that does a considerable amount of disk accesses. A major portion of the compute processors' 

three i860 processors) and 16 MBytes or more of memory. The nodes can operate individually or as a group 

www.ece.nwu.edu/~meena/papers/ipps.ps

On Using Intelligent Network Interface Cards to.. - Fiuczynski, Martin.. (1998) (Correct) (2 citations)
on the network interface, and transfers video data arriving from the network directly to the region 
address space as the firmware, with low-overhead access to services and hardware, and don't require the 
the network directly to the region of frame buffer memory representing the applications window. As a result 

www.cs.berkeley.edu/~rmartin/papers/mef-nossdav98.ps

Mining Frequent Patterns without Candidate Generation - Han, Pei, Yin (1999) (Correct) (56 citations)

have better chances of sharing nodes than less frequently occurring ones. Second, an FP-tree-based pattern 

ftp.fas.sfu.ca/pub/cs/han/pdf/sigmod00.pdf

Random Texts Exhibit Zipf's-Law-Like Word Frequency Distribution - Li (1992) (Correct) (15 citations)
to have larger frequencies than those less frequently occurring. Nevertheless, it seems to be a puzzle 

linkage.rockefeller.edu/wli/pub/zipf.ps

A Code Compression System Based on Pipelined Interpreters - Hoogerbrugge.. (1999) (Correct) (8 citations)
occurring codewords in fewer bits than less frequently occurring codewords. The best known statistical 

einstein.et.tudelft.nl/Hanh/philips-publications/compact.paper.pdf

Missing Feature Theory In ASR: Make Sure You Miss The.. - de Veth, de Wet.. (1999) (Correct) (2 citations)
are reliable estimators of the less frequently occurring feature values. As a consequence, it 

www.dcs.shef.ac.uk/~tjupco/papers/deveth.1999.1.ps.gz

Expressive Probability Models For Speech Recognition And.. - Russell (1999) (Correct) (1 citation)
336 phonemes the last CDA result used less frequently occurring phonemes and had a size of 666. The 

www.cs.berkeley.edu/~russell/papers/asru99-abstract.ps

Recovering Documentation-to-Source-Code Traceability Links.. - Marcus, Maletic (Correct)

of LSI, the term combinations which are less frequently occurring in the given document collection tend 

trident.mcs.kent.edu/~amarcus/papers/icse03.pdf

Unknown - Algorithms For Lossless (Correct)

number of bits than the codes used for less frequently occurring symbols. Naturally occurring images 

www.cise.ufl.edu/~sahni/papers/lossless23.pdf

Beauty is in the Genes of the Beholder - Harel, Unger, Sussman (1984) (Correct)

communication (1982). In contrast, in the less frequently occurring forms of DNA, i.e. A-DNA and Z-DNA, the

www.wisdom.weizmann.ac.il/~dharel/SCANNEDPAPERS/BeautyIsinTheGenes.pdf

Discovering actionable patterns in event data - Hellerstein, Ma, Perng (Correct)

patterns that also address important but less frequently occurring phenomena. Furthermore, event data 

researchweb.watson.ibm.com/journal/sj/413/hellerstein.pdf

Tsunami Generation By Submarine Mass Failure - Part II Case (Correct)

magnitude, with larger events typically occurring less frequently (Prior and Coleman, 1979 Edgers and 
with larger events typically occurring less frequently (Prior and Coleman, 1979 Edgers and Karlsrud, 
of these five mechanisms, in part because their occurrence is often concealed from view and in part 

www.oce.uri.edu/~grilli/tsunami-asce_part2.pdf

Unknown - The Process That (Correct)

and these are coded efficiently. The less frequently occurring patterns are coded in some less efficient

www.cs.colorado.edu/homes/jwilson/public_html/papers/data compression.ps

Organising keywords in a web search environment: a.. - Ding, Chowdhury, Foo (Correct)
results were obtained if only the less frequently occurring terms were clustered and if the more 

www.cs.vu.nl/~ying/download/ISKO_Finaldraft.pdf

Automatic Speech Recognition in Adverse Acoustic Conditions - de Wet, de Veth, Cranen.. (1999) (Correct)
challenge1.let.kun.nl/literature/dewet.1999.2.ps

Acoustic Pre-processing For Optimal Effectivity Of.. - de Veth, Cranen, de.. (1999) (Correct)
challenge1.let.kun.nl/literature/deveth.1999.3.ps

chosen in order, e.g. to degrade the less frequently occurring operation. A read operation simply

l2r.cs.uiuc.edu/~danr/Papers/linearJ.ps.gz

Linguistically Engineered Tools for Speech Recognition.. - Van Ess-Dykema, Ries (1998) (Correct)
significant error reduction in the less frequently occurring Dialog Acts and we report on the

werner.ira.uka.de/papers/speech/ICSLP98/ICSLP98-carol.ps.gz

Locating Special Events when Solving ODEs - Gladwell, Shampine (Correct)

event function, and is cautious in other less frequently occurring circumstances. If we did not insist on 

cygnus.math.smu.edu/pub/gladwell/smevents.ps.gz

A Two stage process for accurate image segmentation - Campbell, Thomas, Troscianko (1997) (Correct)
cluster represents 'vegetation'. Other, less frequently occurring labels, are also present but are

hypatia.dcs.qmw.ac.uk/data/uk/cs.bris.ac.uk/1997/1997-W-1.ps.gz

Hot cold optimization of large Windows/NT applications
Robert Cohn, P. Geoffrey Lowney 

December 1996 Proceedings of the 29th annual ACM/IEEE international symposium on 
Microarchitecture 




A dynamic instruction trace often contains many unnecessary instructions that are required
only by the unexecuted portion of the program. Hot-cold optimization (HCO) is a technique
that realizes this performance opportunity. HCO uses profile information to partition each
routine into frequently executed (hot) and infrequently executed (cold) parts. Unnecessary
operations in the hot portion are removed, and compensation code is added on transitions 
from hot to cold as needed. We evaluate HCO on a ... 



Keywords: optimization, profile, NT, register allocation



2 



3 



Software profiling for hot path prediction: less is more
Evelyn Duesterwald, Vasanth Bala 

November 2000 Proceedings of the ninth international conference on Architectural
support for programming languages and operating systems, Volume 34, 28 Issue 5, 5


Recently, there has been a growing interest in exploiting profile information in adaptive 
systems such as just-in-time compilers, dynamic optimizers, and binary translators. In this
paper, we show that sophisticated software profiling schemes that provide highly accurate 
information in an offline setting are ill-suited for these dynamic code generation systems. 
We experimentally demonstrate that hot path predictions must be made early in order to 
control the rising cost of missed opportunity tha ... 

Software profiling for hot path prediction: less is more

Evelyn Duesterwald, Vasanth Bala 

November 2000 ACM SIGPLAN Notices, volume 35 issue 11


Recently, there has been a growing interest in exploiting profile information in adaptive 
systems such as just-in-time compilers, dynamic optimizers, and binary translators. In this
paper, we show that sophisticated software profiling schemes that provide highly accurate 
information in an offline setting are ill-suited for these dynamic code generation systems. 
We experimentally demonstrate that hot path predictions must be made early in order to 
control the rising cost of missed opportunity tha ... 

program phases for post-link optimization

Ronald D. Barnes, Erik M. Nystrom, Matthew C. Merten, Wen-mei W. Hwu 

November 2002 Proceedings of the 35th annual ACM/IEEE international symposium on 

Microarchitecture 


This paper presents Vacuum Packing, a new approach to profile-based program 
optimization. Instead of using traditional aggregate or summarized execution profile 
weights, this approach uses a transparent hardware profiler to automatically detect 
execution phases and record branch profile information for each new phase. The code 
extraction algorithm then produces code packages that are specially formed for their 
corresponding phases. The algorithm compensates for the incomplete and often 
incoheren ...

5 Performance analysis of ATM Banyan networks with shared queueing - part II: correlated/unbalanced offered traffic
Achille Pattavina, Stefano Gianatti 

August 1994 IEEE/ACM Transactions on Networking (TON), volume 2 issue 4 

A program profile attributes run-time costs to portions of a program's execution. Most
profiling systems suffer from two major deficiencies: first, they only apportion simple 
metrics, such as execution frequency or elapsed time to static, syntactic units, such as 
procedures or statements; second, they aggressively reduce the volume of information 
collected and reported, although aggregation can hide striking differences in program 
behavior. This paper addresses both concerns by exploiting the har ...

This paper presents a new mechanism for collecting and deploying runtime optimized code.
The code-collecting component resides in the instruction retirement stage and lays out hot
execution paths to improve instruction fetch rate as well as enable further code
optimization. The code deployment component uses an extension to the Branch Target 
Buffer to migrate execution into the new code without modifying the original code. No 
significant delay is added to the total execution of the program ... 

tolerance on top of a connection oriented realtime service such as that provided by the
Tenet scheme. A framework to study the dispersity schemes is presented. The simulations
show that the dispersity schemes, by dividing the connection's traffic among multiple paths
in the network, have a beneficent effect on the capacity of the network. Thus, for certain
classes of dispersity schemes, we obtain a small impr ...

This paper describes the implementation of an online feedback-directed optimization
system. The system is fully automatic; it requires no prior (offline) profiling run. It uses a 
previously developed low-overhead instrumentation sampling framework to collect control 
flow graph edge profiles. This profile information is used to drive several traditional 
optimizations, as well as a novel algorithm for performing feedback-directed control flow 
graph node splitting. We empirically evaluate this syst ... 

Keywords: adaptive optimization, dynamic optimization, online algorithms, virtual 
machines 

In many garbage collected systems, the mutator performs a write barrier for every pointer
update. Using generational garbage collectors, we study in depth three code placement
options for remembered-set write barriers: inlined, out-of-line, and partially inlined (fast
path inlined, slow path out-of-line). The fast path determines if the collector needs to
remember the pointer update. The slow path records the pointer in a list when necessary.
Efficient implementations minimize the instructions on ... 

Keywords: Java, copying collection, generational collection, write barriers 

Optimization of a circuit by transistor sizing is often a slow, tedious and iterative manual
process which relies on designer intuition. Circuit simulation is carried out in the inner loop 
of this tuning procedure. Automating the transistor sizing process is an important step 
towards being able to rapidly design high-performance, custom circuits. JiffyTune is a new 
circuit optimization tool that automates the tuning task. Delay, rise/fall time, area and 
power targets are accommodated. Each (weig ... 

Keywords: Circuits, transistor sizing, optimization, simulation, gradients. 

In this paper we present a performance study of memory reference behavior in network
protocol processing, using an Internet-based protocol stack implemented in the x-kernel
running in user space on a MIPS R4400-based Silicon Graphics machine. We use the 
protocols to drive a validated execution-driven architectural simulator of our machine. We 
characterize the behavior of network protocol processing, deriving statistics such as cache 
miss rates and percentage of time spent waiting for memo ... 

Profile data is valuable for identifying performance bottlenecks and guiding optimizations.
Periodic sampling of a processor's performance monitoring hardware is an effective, 
unobtrusive way to obtain detailed profiles. Unfortunately, existing hardware simply counts 
events, such as cache misses and branch mispredictions, and cannot accurately attribute 
these events to instructions, especially on out-of-order machines. We propose an 
alternative approach, called ProfileMe, that samples instructio ...

Recent work in history-based branch prediction uses novel hardware structures to capture
branch correlation and increase branch prediction accuracy. Branch correlation occurs when
the outcome of a conditional branch can be accurately predicted by observing the outcomes
of previously executed branches in the dynamic instruction stream. In this article, we show
how to instrument a program so that it is practical to collect run-time statistics that indicate
where branch correl ... 

Keywords: branch correlation, branch prediction, path profiling, profile-driven optimization 

A major component of a parallel machine is its interconnection network, which provides
concurrent communication between the processing elements. It is common to use a
multi-stage interconnection network (MIN) which is constructed using crossbar switches and
introduces not only contention for destination addresses but also additional contention for
internal switches. Both types of contention are increased when non-local communication

Instruction cache performance is important to instruction fetch efficiency and overall
processor performance. The layout of an executable has a substantial effect on the cache
miss rate and the instruction working set size during execution. This means that the
performance of an executable can be improved by applying a code-placement algorithm 
that minimizes instruction cache conflicts and improves spatial locality. We describe an 
algorithm for procedure placement, one type of code placement ... 

Keywords: code placement, conflict misses, temporal profiling, working-set optimization 

A program's cache performance can be improved by changing the organization and layout
of its data - even complex, pointer-based data structures. Previous techniques improved
the cache performance of these structures by arranging distinct instances to increase
reference locality. These techniques produced significant performance improvements, but
worked best for small structures that could be packed into a cache block. This paper extends
that work by concentrating on the internal organization off ... 

Keywords: cache-conscious definition, class splitting, field reorganization, structure 
splitting 

Memory hierarchy performance has always been an important issue in computer
architecture design. The likelihood of a bottleneck in the memory hierarchy is increasing, as
improvements in microprocessor performance continue to outpace those made in the
memory system. As a result, effective utilization of cache memories is essential in today's
architectures. The nature of procedural software poses visibility problems when attempting
to perform program optimization. One approach to increasing visibil ... 